Psychology Dictionary of Arguments

Neural networks: Neural networks are computational models inspired by the human brain, designed to recognize patterns and solve complex problems. They consist of layers of interconnected nodes (analogous to neurons) that process input data and learn to perform tasks by adjusting the strength of connections based on feedback. Used extensively in machine learning, they enable applications like image recognition, language processing, and predictive analysis. See also Artificial neural networks, Connectionism, Computer models, Computation, Artificial Intelligence, Machine learning.
_____________
Annotation: The above characterizations of concepts are neither definitions nor exhaustive presentations of the problems related to them. Instead, they are intended to give a short introduction to the contributions below. – Lexicon of Arguments.

 

Peter Norvig on Neural Networks - Dictionary of Arguments

Norvig I 761
Neural Networks/Norvig/Russell: Literature on neural networks: Cowan and Sharp (1988b(1), 1988a(2)) survey the early history, beginning with the work of McCulloch and Pitts (1943)(3). John McCarthy has pointed to the work of Nicolas Rashevsky (1936(4), 1938(5)) as the earliest mathematical model of neural learning. Norbert Wiener, a pioneer of cybernetics and control theory (Wiener, 1948)(6), worked with McCulloch and Pitts and influenced a number of young researchers including Marvin Minsky, who may have been the first to develop a working neural network in hardware in 1951 (see Minsky and Papert, 1988(7), pp. ix–x). Turing (1948)(8) wrote a research report titled Intelligent Machinery that begins with the sentence “I propose to investigate the question as to whether it is possible for machinery to show intelligent behaviour” and goes on to describe a recurrent neural network architecture he called “B-type unorganized machines” and an approach to training them. Unfortunately, the report went unpublished until 1969, and was all but ignored until recently.
Frank Rosenblatt (1957)(9) invented the modern “perceptron” and proved the perceptron convergence theorem (1960), although it had been foreshadowed by purely mathematical work outside the context of neural networks (Agmon, 1954(10); Motzkin and Schoenberg, 1954(11)). Some early work was also done on multilayer networks, including Gamba perceptrons (Gamba et al., 1961)(12) and madalines (Widrow, 1962)(13). Learning Machines (Nilsson, 1965)(14) covers much of this early work and more. The subsequent demise of early perceptron research efforts was hastened (or, the authors later claimed, merely explained) by the book Perceptrons (Minsky and Papert, 1969)(15), which lamented the field’s lack of mathematical rigor. The book pointed out that single-layer perceptrons could represent only linearly separable concepts and noted the lack of effective learning algorithms for multilayer networks.
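For illustration, here is a minimal Python sketch of the perceptron learning rule and of the linear-separability limitation discussed above; the function names and the toy OR/XOR data are my own, not drawn from Norvig and Russell.

```python
# A minimal sketch of the perceptron learning rule (illustrative code,
# not taken from the cited works). It converges for the linearly
# separable OR concept but cannot represent XOR, as Minsky and Papert
# pointed out for single-layer perceptrons.
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    """Learn weights w and bias b with the classic error-driven update."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            # Update only on mistakes; converges if the classes are separable.
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])    # linearly separable: learnable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable: not learnable

for name, targets in [("OR", y_or), ("XOR", y_xor)]:
    w, b = train_perceptron(X, targets)
    preds = [1 if x @ w + b > 0 else 0 for x in X]
    print(name, "predictions:", preds, "targets:", list(targets))
```

Running the sketch reproduces the OR targets exactly, while at least one XOR case remains misclassified no matter how long training continues.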
The papers in (Hinton and Anderson, 1981)(16), based on a conference in San Diego in 1979, can be regarded as marking a renaissance of connectionism. The two-volume “PDP” (Parallel Distributed Processing) anthology (Rumelhart et al., 1986a)(17) and a short article in Nature (Rumelhart et al., 1986b)(18) attracted a great deal of attention; indeed, the number of papers on “neural networks” multiplied by a factor of 200 between 1980–84 and 1990–94. The analysis of neural networks using the physical theory of magnetic spin glasses (Amit et al., 1985)(19) tightened the links between statistical mechanics and neural network theory, providing not only useful mathematical insights but also respectability. The back-propagation technique had been invented quite early (Bryson and Ho, 1969)(20) but it was rediscovered several times (Werbos, 1974(21); Parker, 1985(22)).
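As an illustration of the back-propagation idea mentioned above (a from-scratch sketch, not the historical formulations of Bryson and Ho, Werbos, or Parker), the following code trains a small two-layer sigmoid network on XOR by propagating the error gradient backwards through the layers; the architecture and hyperparameters are arbitrary choices for the example.

```python
# A compact sketch of the back-propagation idea for a two-layer sigmoid
# network, written from scratch for illustration (not the historical
# formulations cited above). It learns XOR, which a single-layer
# perceptron cannot represent.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # hidden -> output

lr = 1.0
for _ in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the squared-error gradient layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2))   # typically approaches [[0], [1], [1], [0]]
```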
The probabilistic interpretation of neural networks has several sources, including Baum and Wilczek (1988)(23) and Bridle (1990)(24). The role of the sigmoid function is discussed by Jordan (1995)(25). Bayesian parameter learning for neural networks was proposed by MacKay
Norvig I 762
(1992)(26) and is explored further by Neal (1996)(27). The capacity of neural networks to represent functions was investigated by Cybenko (1988(28), 1989(29)), who showed that two hidden layers are enough to represent any function and a single layer is enough to represent any continuous function.
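The probabilistic reading mentioned above can be made concrete with a small sketch: the logistic (sigmoid) function turns a real-valued score into a probability, and the softmax function, the formulation associated with Bridle (1990)(24), normalizes several scores into a class distribution. The code and numbers below are purely illustrative.

```python
# Illustrative sketch (my own, not from the cited papers) of the
# probabilistic reading of network outputs: the logistic (sigmoid)
# function maps a real-valued score to a probability, and the softmax
# function normalizes several scores into a distribution over classes.
import numpy as np

def sigmoid(z):
    """Logistic function: maps a log-odds score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    """Normalized exponentials of the scores; the result sums to 1."""
    e = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))                          # 0.5: even odds
print(sigmoid(2.0))                          # ~0.88
print(softmax(np.array([2.0, 1.0, 0.1])))    # ~[0.66, 0.24, 0.10]
```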
The “optimal brain damage” method (>Artificial neural networks/Norvig) for removing useless connections is by LeCun et al. (1989)(30), and Sietsma and Dow (1988)(31) show how to remove useless units. >Complexity/Norvig.
Norvig I 763
For neural nets, Bishop (1995)(32), Ripley (1996)(33), and Haykin (2008)(34) are the leading texts. The field of computational neuroscience is covered by Dayan and Abbott (2001)(35).


1. Cowan, J. D. and Sharp, D. H. (1988b). Neural nets and artificial intelligence. Daedalus, 117, 85–121.
2. Cowan, J. D. and Sharp, D. H. (1988a). Neural nets. Quarterly Reviews of Biophysics, 21, 365–427.
3. McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–137.
4. Rashevsky, N. (1936). Physico-mathematical aspects of excitation and conduction in nerves. In Cold Spring Harbor Symposia on Quantitative Biology. IV: Excitation Phenomena, pp. 90–97.
5. Rashevsky, N. (1938). Mathematical Biophysics: Physico-Mathematical Foundations of Biology. University of Chicago Press.
6. Wiener, N. (1948). Cybernetics. Wiley.
7. Minsky, M. L. and Papert, S. (1988). Perceptrons: An Introduction to Computational Geometry (Expanded edition). MIT Press.
8. Turing, A. (1948). Intelligent machinery. Tech. rep., National Physical Laboratory. Reprinted in (Ince, 1992).
9. Rosenblatt, F. (1957). The perceptron: A perceiving and recognizing automaton. Report 85-460-1, Project PARA, Cornell Aeronautical Laboratory.
10. Agmon, S. (1954). The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3), 382–392.
11. Motzkin, T. S. and Schoenberg, I. J. (1954). The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3), 393–404.
12. Gamba, A., Gamberini, L., Palmieri, G., and Sanna, R. (1961). Further experiments with PAPA. Nuovo Cimento Supplemento, 20(2), 221–231.
13. Widrow, B. (1962). Generalization and information storage in networks of adaline “neurons”. In Self-Organizing Systems 1962, pp. 435–461.
14. Nilsson, N. J. (1965). Learning Machines: Foundations of Trainable Pattern-Classifying Systems. McGraw-Hill. Republished in 1990.
15. Minsky, M. L. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry (first edition). MIT Press.
16. Hinton, G. E. and Anderson, J. A. (1981). Parallel Models of Associative Memory. Lawrence Erlbaum Associates.
17. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986a). Learning internal representations by error propagation. In Rumelhart, D. E. and McClelland, J. L. (Eds.), Parallel Distributed Processing, Vol. 1, chap. 8, pp. 318–362. MIT Press.
18. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986b). Learning representations by back propagating errors. Nature, 323, 533–536.
19. Amit, D., Gutfreund, H., and Sompolinsky, H. (1985). Spin-glass models of neural networks. Physical Review A, 32, 1007–1018.
20. Bryson, A. E. and Ho, Y.-C. (1969). Applied Optimal Control. Blaisdell.
21. Werbos, P. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. thesis, Harvard University.
22. Parker, D. B. (1985). Learning logic. Technical report TR-47, Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology.
23. Baum, E. and Wilczek, F. (1988). Supervised learning of probability distributions by neural networks. In Anderson, D. Z. (Ed.), Neural Information Processing Systems, pp. 52–61. American Institute of Physics.
24. Bridle, J. S. (1990). Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Fogelman Soulié, F. and Hérault, J. (Eds.), Neurocomputing: Algorithms, Architectures and Applications. Springer-Verlag.
25. Jordan, M. I. (1995). Why the logistic function? A tutorial discussion on probabilities and neural networks. Computational cognitive science technical report 9503, Massachusetts Institute of Technology.
26. MacKay, D. J. C. (1992). A practical Bayesian framework for back-propagation networks. Neural Computation, 4(3), 448–472.
27. Neal, R. (1996). Bayesian Learning for Neural Networks. Springer-Verlag.
28. Cybenko, G. (1988). Continuous valued neural networks with two hidden layers are sufficient. Technical report, Department of Computer Science, Tufts University.
29. Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2, 303–314.
30. LeCun, Y., Jackel, L., Boser, B., and Denker, J. (1989). Handwritten digit recognition: Applications of neural network chips and automatic learning. IEEE Communications Magazine, 27(11), 41–46.
31. Sietsma, J. and Dow, R. J. F. (1988). Neural net pruning - Why and how. In IEEE International Conference on Neural Networks, pp. 325–333.
32. Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.
33. Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press.
34. Haykin, S. (2008). Neural Networks: A Comprehensive Foundation. Prentice Hall.
35. Dayan, P. and Abbott, L. F. (2001). Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press.

_____________
Explanation of symbols: Roman numerals indicate the source, Arabic numerals indicate the page number. The corresponding books are indicated below. ((s)…): Comment by the sender of the contribution. Translations: Dictionary of Arguments
Notes such as [Concept/Author], [Author1]Vs[Author2] or [Author]Vs[term], as well as "problem:"/"solution:", "old:"/"new:" and "thesis:", are additions by the Dictionary of Arguments. If a German edition is specified, the page numbers refer to that edition.

Norvig I
Peter Norvig
Stuart J. Russell
Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ 2010

